Systematic Entomology — Latest Matching Preprints

1

Global delimitation of Cyanoboletus, Cacaoporus and Cupreoboletus (Basidiomycota: Boletaceae)

Oliveira, P.; Mariquito, R.

2026-05-14 evolutionary biology 10.64898/2026.05.12.724631 medRxiv

Top 0.1%

6.9%

This investigation aimed to compile all phylogenetic lineages within and around the genus Cyanoboletus. The evolutionary inference obtained from the nuclear ribosomal internal transcribed spacer region (ITS) suggests that some of the species currently classified in Cyanoboletus belong to lineages separate from the genus, supporting a narrower boundary that includes only the species that develop a strong staining reaction when touched and when the context is exposed to air. The separate lineages are the monotypic genus Cupreoboletus and a few species that do not develop such a reaction; the latter form a clade together with the genera Cacaoporus and Acyanoboletus, so the concept of Cacaoporus is broadened to encompass all of them. The emerging 3C perspective of Cupreoboletus, Cacaoporus and Cyanoboletus offers a remarkably consistent morphological diagnosis, overcoming the problems of an overly broad concept of Cyanoboletus. This work reveals that Boletus neotropicus, B. novae-zelandiae and B. sensibilis belong in Cyanoboletus, Cacaoporus and Lanmaoa, respectively, and, through analysis of concatenated multigene alignments, identifies lineages that probably represent undescribed species: at least four in Cacaoporus and at least five in Cyanoboletus. Diagnostic tables and dichotomous keys are presented by geographic region. The present work also examines the phylogenetic position of Neoboletus flavosanguineus, a species once classified in Cyanoboletus. The complexity of assigning species epithets in some lineages is addressed, namely the boundary between Cacaoporus instabilis and Ca. fagaceophilus and the diversity under the names Cyanoboletus sinopulverulentus and Cy. pulverulentus. The overall picture of evolutionary lineages provides a framework for choosing reference data that can give future phylogenetic studies involving the 3C a balanced and efficient coverage.

2

First record of the subfamily Eucerotinae (Hymenoptera: Ichneumonidae) from the mainland Afrotropics, with a description of a new species

Hopkins, T.; Nascimento, A.; Santos, B. F.; Hovorka, T.; Sääksjärvi, I. E.; Österman, E. M.

2026-05-14 zoology 10.64898/2026.05.11.724332 medRxiv

Top 0.1%

3.7%

The ichneumonid subfamily Eucerotinae has been thought to be almost absent from the tropics, with the only known Afrotropical species found in Madagascar. We report the subfamily to be present in the mainland Afrotropics, and describe a new species, Euceros species 1 from Uganda and Cameroon (name not yet shown in preprint). The subfamily had likely not been observed in the mainland Afrotropics before due to low abundances and insufficient sampling. More Eucerotinae likely remain to be discovered in tropical Africa and Asia, although tropical America may genuinely have few eucerotine species. Much more extensive sampling will be needed before it is possible to make confident estimates of how eucerotine diversity is distributed globally.

3

155 years after Van Beneden: redescription and first molecular characterisation of the enigmatic type species, Ascarophis morrhuae Van Beneden, 1870 (Nematoda, Cystidicolidae), and comparison to other Ascarophis species in the North Atlantic

Appy, R. G.; Vanhove, M. P. M.; MacKenzie, K.; Hernandez-Orts, J. S.; Kmentova, N.

2026-04-17 zoology 10.64898/2026.04.15.718624 medRxiv

Top 0.1%

1.7%

Nematodes belonging to the Cystidicolidae Skrjabin, 1946 comprise more than 23 genera and 111 recognized species in fish from many habitats, including the deep sea, continental shelves, and estuarine and freshwater habitats. The taxonomy of many cystidicolid species is unsettled because of their small size and correspondingly small morphological characters, which require scanning electron microscopy and, more recently, support from molecular studies. Ascarophis morrhuae Van Beneden, 1870, the type species of Ascarophis, one of the earliest described and most speciose cystidicolid genera (46 species), is based on a two-sentence description of a single female specimen from an Atlantic cod, Gadus morhua, presumably captured off the coast of Belgium in the North Sea (Van Beneden, 1870). New material was collected and examined from Atlantic cod and haddock, Melanogrammus aeglefinus, from Iceland and the North Sea, and specimens held in the Natural History Museum, London, were also studied. Based on these materials, A. morrhuae is morphologically redescribed, the first DNA sequences of this species are provided, the species is differentiated from other Ascarophis species present in the North Atlantic, and previous records are reviewed. This information provides a foundation for taxonomic and phylogenetic reconsideration of all cystidicolid nematodes and related families.

4

Comparative morphology of silk-spinning systems in amphipods

McKim, S.; Turner, T. L.

2026-05-12 evolutionary biology 10.64898/2026.05.07.723571 medRxiv

Top 0.1%

1.7%

Silk glands have been found in two groups of amphipods: the Corophiida and the Ampeliscidae. The silk glands of Ampeliscidae, however, have yet to be examined in detail. Here we report, for the first time, the morphology and distribution of pereopodal glands in the Ampeliscidae, in the non-thread-producing Synopiidae, and in the Paragammaropsidae. In the Ampeliscidae, we found two gland types distributed throughout all pereopods, which have the ability to create threads. Pereopods three and four have additional silk-extrusion morphology at the tip of the dactylus, where silk is transformed into semi-cylindrical threads used for building domiciles. Synopiid outgroup species have one of the gland types but lack silk-extrusion morphology. Using ancestral state reconstruction, we find that the glands in the Synopiidae are likely ancestral and hypothesize that the silk glands of Ampeliscidae are derived from these ancestral glands. Silk-spinning pereopods in the Paragammaropsidae showed similarities with those of both Corophiida and Ampeliscidae, but also distinct features. Ampeliscid silk-spinning systems bear a surprising resemblance to those of the Corophiida, which prompts a reconsideration of the taxonomic placement of Ampeliscidae and of the origins of silk spinning in amphipods. This is the first comprehensive study of the glandular systems of Ampeliscidae, Synopiidae, and Paragammaropsidae using advanced microscopy, providing pertinent morphological data for the study of arthropod silk gland evolution and complex traits.

5

Automated landmark and semilandmark annotation for wing geometric morphometrics in Diptera using deep learning

Nolte, K.; Baumbach, J.; Kollmannsberger, P.; Sauer, F. G.; Luehken, R.

2026-04-21 bioinformatics 10.64898/2026.04.17.719146 medRxiv

Top 0.1%

1.2%

1. Diptera are a diverse insect order that includes vectors of human and animal pathogens. Accurate species identification remains a major bottleneck in ecological and epidemiological studies. Morphological identification requires taxonomic expertise, while molecular methods are costly and not universally reliable. Wing geometric morphometrics offers an alternative, but manual landmark annotation is time-consuming and introduces observer bias. 2. We developed ITHILDIN, an automated pipeline for landmark and semilandmark annotation of Diptera wings that combines UNet++ segmentation with an Hourglass landmark prediction model. Using mosquitoes as the primary model system, we extended an existing repository with 5,793 additional images. Models were trained on 5,991 annotations of landmarks and segmentations and then evaluated on 12,522 images across 34 taxa. We assessed landmark prediction accuracy against human observers and ML-morph, evaluated species identification using linear discriminant analysis (LDA) on 17 homologous landmarks and 52 semilandmarks, and tested out-of-distribution generalisation by reproducing an independent study. Transferability was demonstrated by adapting the pipeline to the dipteran families Drosophilidae and Glossinidae. 3. The Hourglass model achieved a mean landmark error of 4.5 pixels (95% CI: 4.3-4.6), within human observer variability (4.7 pixels, 95% CI: 4.4-5.0), and substantially outperformed ML-morph (12.7 pixels, 95% CI: 11.1-14.2). The semilandmark-based approach to species identification achieved 91% balanced accuracy across 34 taxa, comparable to the performance of a convolutional neural network (CNN; 94%). On out-of-distribution data, the landmark pipeline generalised substantially better than the CNN, and a soft-voting ensemble of the landmark and CNN classifiers achieved 88% balanced accuracy on a replicated study. 4. Combining geometric morphometrics with deep learning provides a reproducible, interpretable, and generalisable alternative to black-box CNN classifiers for Diptera wing analysis. By acting as a consistent single observer comparable to a human annotator, the system eliminates inter-observer bias, enabling large-scale and cross-study morphometric analyses of dipteran wings. The system is publicly available at www.ithildin.bnitm.de and can be transferred to other Diptera families with moderate retraining effort.
Data availability: Images used in this study are accessible under a CC BY 4.0 license at https://doi.org/10.6019/S-BIAD1478. A downloadable and installable Docker application can be accessed on the application's Git page: https://anonymous.4open.science/r/ITHILDIN-4313/
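To make the evaluation quantities above concrete, here is a minimal Python sketch of a mean Euclidean landmark error, balanced accuracy (mean per-class recall), and a two-model soft-voting rule. It is not ITHILDIN's code: the function names, the equal default weighting, and the dictionary-of-probabilities inputs are assumptions for illustration, and the study's exact definitions (for example, how errors are averaged across landmarks and images) may differ.

```python
import math
from collections import defaultdict

def mean_landmark_error(predicted, reference):
    """Mean Euclidean distance (pixels) between paired predicted and reference landmarks."""
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    return sum(distances) / len(distances)

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall, so rare taxa weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        hits[truth] += truth == pred  # adds 1 for a correct prediction, 0 otherwise
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two classifiers' class probabilities; returns the top class."""
    classes = set(probs_a) | set(probs_b)
    combined = {c: weight_a * probs_a.get(c, 0.0) + (1 - weight_a) * probs_b.get(c, 0.0)
                for c in classes}
    return max(combined, key=combined.get)
```

For example, if a landmark-based classifier gives {"Aedes": 0.6, "Culex": 0.4} and a CNN gives {"Aedes": 0.2, "Culex": 0.8}, the averaged probabilities are 0.4 and 0.6, so the ensemble predicts "Culex". Balanced accuracy is shown because plain accuracy can look high when one common taxon dominates the test set.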

6

TaxonMatch: taxonomic integration and tree construction from heterogeneous biological databases

Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.

2026-03-20 evolutionary biology 10.64898/2026.03.18.712418 medRxiv

Top 0.1%

1.2%

Integrating taxonomic data from various sources is a significant challenge in biodiversity research because of non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories such as the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy across NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN endangered species that have molecular data. TaxonMatch provides a cohesive taxonomic framework and a consistent taxonomic backbone, and can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
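The abstract does not describe how TaxonMatch matches names internally. As a generic illustration of typo-tolerant name alignment only, the sketch below normalizes binomial capitalization and whitespace and then falls back to fuzzy string matching with Python's standard `difflib`. The function names and the 0.85 cutoff are hypothetical, this is not TaxonMatch's API, and it does not attempt synonym resolution.

```python
import difflib

def normalize(name):
    """Collapse whitespace and apply binomial capitalization (Genus epithet)."""
    parts = name.strip().split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])

def match_names(query_names, reference_names, cutoff=0.85):
    """Map each query name to its closest reference name, or None if nothing is close."""
    reference = {normalize(r): r for r in reference_names}
    result = {}
    for query in query_names:
        normalized = normalize(query)
        if normalized in reference:  # exact match after normalization
            result[query] = reference[normalized]
            continue
        # Fuzzy fallback catches single-letter typos such as "terestris".
        close = difflib.get_close_matches(normalized, list(reference), n=1, cutoff=cutoff)
        result[query] = reference[close[0]] if close else None
    return result
```

With references ["Apis mellifera", "Bombus terrestris"], the queries "apis  mellifera" and "Bombus terestris" both resolve, while "Vespa crabro" maps to None. A real backbone would also need synonym tables and rank-aware checks, which this sketch omits.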

7

Phylogenomics of the mega genus Bulbophyllum (Orchidaceae) and implications for its infrageneric classification

Nanjala, C.; Simpson, L.; Hu, A.-Q.; Patel, V.; Nicholls, J. A.; Bent, S. J.; Gale, S. W.; Fischer, G. A.; Goedderz, S.; Schuiteman, A.; Crayn, D.; Clements, M. A.; Nargar, K.

2026-04-01 evolutionary biology 10.64898/2026.03.30.715161 medRxiv

Top 0.1%

1.0%

Understanding evolutionary relationships in hyperdiverse plant groups remains a major challenge in systematics. The orchid genus Bulbophyllum, the second largest genus of flowering plants, represents an exceptional example of phylogenetic and morphological complexity. Relationships, particularly within the species-rich Asian clade, have remained poorly resolved due to extensive morphological variation and limited resolution in previous phylogenetic studies. Here, we reconstructed phylogenetic relationships using 63 plastid genes from 355 specimens representing 322 species and 65 of the 97 recognised sections of Bulbophyllum. Our analyses confirmed that the genus comprises five major evolutionary lineages, composed of species predominantly from Australasia, Madagascar, continental Africa, the Neotropics, and Asia. We provide the first robust phylogenetic evidence for a dichotomous split within the Asian clade into two well-supported lineages: the Asian-Malesian clade and the Malesian-Papuasian clade, with the latter containing a strongly supported Papuasian subclade. Additionally, this study supports the monophyly of several currently recognised sections while clarifying relationships in previously problematic groups. This study provides the most comprehensive plastid-based phylogenomic framework for Bulbophyllum to date and establishes a foundation for future taxonomic revision and integrative analyses of diversification and trait evolution within this hyperdiverse genus.

8

Reassessing display behavior from Bels et al. (2025) given the complexity of anthropogenic hybridization and intraspecific diversity in Iguana iguana

van den Burg, M. P.; Thibaudier, J.

2026-03-23 zoology 10.64898/2026.03.19.713079 medRxiv

Top 0.1%

0.8%

Understanding behavioral differences between non-native and closely related endangered species could aid conservation management. In volume 169 of Zoology, Bels et al. (2025) reported on their comparison of display-action-patterns (DAP) between native Iguana delicatissima and non-native iguanas present on islands of the Guadeloupe Archipelago in the Caribbean Lesser Antilles. Here, we address conceptual and methodological concerns about their work and reanalyze their data given our proposed corrections, primarily a literature-informed adjustment of their "species" category. We additionally use online videos from South American mainland I. iguana populations, from which the non-native iguanas in the Guadeloupe Archipelago originate, to better understand the different DAPs between native and non-native iguanas in the Guadeloupe Archipelago. Significant differences in DAP characteristics among "species" categories (native I. delicatissima, non-native iguanas, and hybrids) show that Bels et al. (2025) oversimplified their data analyses by merging all non-native populations into one group. This result indicates the presence of behavioral variation among subpopulations within widely hybridizing iguanid populations, which has been poorly studied. Additionally, videos from mainland populations across two major mitochondrial clades of Iguana iguana show that non-native iguanas on Guadeloupe retained the DAP characteristics of the populations from which they originate. We discuss these findings in light of the hypotheses put forward by Bels et al. (2025), two of which can be excluded. Overall, our reanalysis shows that studies of traits in settings of complex hybridization in diverse species should acknowledge this complexity.

9

Posterior simulation-based calibration tests of phylogenetic dating methods

King, B.

2026-04-16 evolutionary biology 10.64898/2026.04.14.718426 medRxiv

Top 0.1%

0.8%

Simulation-based calibration (SBC) checking is a method for verifying that the inference machinery for a Bayesian statistical analysis functions correctly and without bias. Typically, SBC begins by sampling parameter values from the model priors (prior SBC). However, it has been shown that prior SBC can miss problems that manifest only in certain regions of parameter space. In phylogenetics, this is relevant not only because of the vastness of tree and parameter space, but also because many phylogenetic analyses involve some degree of model misspecification. Posterior SBC is a recently developed method for checking that the inference algorithms function correctly for a given empirical dataset. Here I use posterior SBC to test the implementation of phylogenetic dating methods in the inference software BEAST 2. I test both the tip-dated approach, using an Indo-European vocabulary dataset, and the node-dated approach, using a molecular rRNA dataset of Tabanidae (horseflies). In both cases, posterior SBC tests indicate good calibration. Despite this, posterior predictive datasets simulated from the posterior distribution provided no further increase in the precision of node age estimates compared to the original posterior, a result consistent with previous literature showing fundamental theoretical limits to the identifiability of node ages. Nevertheless, these results suggest that phylogenetic dating methods in BEAST 2 are not biased by problems with the inference machinery, thereby increasing confidence in results obtained using these methods.
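A toy version of posterior SBC can be run on a conjugate normal model, where the exact posterior is known, so correct inference should yield uniformly distributed ranks. The sketch assumes our reading of posterior SBC: draw a "true" parameter from the posterior given the observed data, simulate new data from it, refit on the observed plus simulated data, and record the rank of the true value among the refitted posterior draws. The model, function names and settings are hypothetical and unrelated to BEAST 2.

```python
import random

def posterior_params(y, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Conjugate normal update for theta with y_i ~ Normal(theta, noise_var)."""
    precision = 1.0 / prior_var + len(y) / noise_var
    mean = (prior_mean / prior_var + sum(y) / noise_var) / precision
    return mean, 1.0 / precision

def posterior_sbc_ranks(y_obs, n_reps=500, n_draws=99, n_new=10, seed=1):
    """Ranks should be uniform on 0..n_draws if the posterior computation is correct."""
    rng = random.Random(seed)
    m0, v0 = posterior_params(y_obs)  # posterior given the observed data only
    ranks = []
    for _ in range(n_reps):
        theta = rng.gauss(m0, v0 ** 0.5)  # "true" value drawn from that posterior
        y_new = [rng.gauss(theta, 1.0) for _ in range(n_new)]
        m1, v1 = posterior_params(y_obs + y_new)  # refit on observed + simulated data
        draws = [rng.gauss(m1, v1 ** 0.5) for _ in range(n_draws)]
        ranks.append(sum(d < theta for d in draws))  # rank of the true value
    return ranks
```

Replacing the refitted posterior with a deliberately wrong one (for example, a variance that is too small) would push ranks toward the extremes, which is the kind of departure from uniformity that SBC is designed to detect.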

10

Resolving the oak tree of life: comparing RADseq and whole genome resequencing methods for oak phylogenetics

Hipp, A. L.; Althaus, K. N.; Fuller, E. L.; Hahn, M.; Larson, D. A.; Mohn, R. A.; Wang, B.; Manos, P. S.

2026-05-17 evolutionary biology 10.64898/2026.05.14.725274 medRxiv

Top 0.1%

0.8%

Forest trees pose numerous potential challenges to phylogenomic inference. Their large effective population sizes and relatively long generation times lead to deep allele coalescence and consequently incomplete lineage sorting (ILS), which biases inferences of divergence times toward older ages and introduces gene tree discordance. Deep phylogenetic divergences, reaching back into the Paleocene, introduce reference-mapping biases. Introgression (the movement of genes between lineages) may result in different phylogenies being inferred depending on which individuals are included in analysis, even if the plurality of the genome favors the divergence history unaffected by introgression. These factors influence phylogenetic inference across the Tree of Life but are particularly prevalent in forest trees. Oaks (Quercus) are notable for all three influences. In addition, our knowledge of the oak phylogeny is currently based strongly on restriction site associated DNA sequencing (RADseq) datasets published over the past decade, which may introduce additional sources of uncertainty. In this chapter, we analyze a 322-species RADseq dataset and genome resequencing data from across the genus to address sources of uncertainty in our understanding of the global oak phylogeny, which we hope will serve as a model for other research groups working on comparable woody plant groups.

11

Ancestral state reconstruction with discrete characters using deep learning

Nagel, A. A.; Landis, M. J.

2026-03-21 evolutionary biology 10.64898/2026.03.19.712918 medRxiv

Top 0.1%

0.7%

Ancestral state reconstruction is a classical problem of broad relevance in phylogenetics. Likelihood-based methods for reconstructing ancestral states under discrete character models, such as Markov models, have proven extremely useful, but they work only so long as the assumed model yields a tractable likelihood function. Unfortunately, extending a simple but tractable phylogenetic model with new, biologically realistic properties often results in an intractable likelihood, preventing its use in standard modeling tasks, including ancestral state reconstruction. The rapid advancement of deep learning offers a potential alternative to likelihood-based inference of ancestral states, particularly for models with intractable likelihoods. In this study, we modify the phylogenetic deep learning software phyddle to conduct ancestral state reconstruction. We evaluate phyddle's performance under various methodological and modeling conditions, comparing it with Bayesian inference when possible. For simple models and small trees, its performance resembles that of Bayesian inference, but it worsens as tree size increases. While phyddle still performs adequately for more complex models, such as speciation and extinction models, its estimates differ more from Bayesian inference than they do for simpler models. Lastly, we use phyddle to infer ancestral states for two empirical datasets: the ancestral ranges of a subclade of the genus Liolaemus, and the ancestral locations of sequences from the 2014 Sierra Leone Ebola virus disease outbreak.
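For context, the likelihood-based approach the abstract contrasts with deep learning can be sketched for the simplest case: Felsenstein's pruning algorithm under a symmetric Mk model, giving marginal probabilities of the root state. This is a minimal illustration assuming a uniform root prior, a single known rate, and a tree written as nested tuples of (child, branch length) pairs; it is not phyddle and does not compute marginals at non-root nodes, which would need an additional pass.

```python
import math

def mk_prob(i, j, t, k, rate):
    """Symmetric Mk transition probability; `rate` is the rate between any two states."""
    e = math.exp(-k * rate * t)
    return 1.0 / k + (k - 1.0) / k * e if i == j else 1.0 / k - e / k

def partial_likelihoods(node, states, k, rate):
    """Pruning step: entry s is P(tip data below node | node in state s)."""
    if isinstance(node, str):  # leaf: its observed state has likelihood 1
        return [1.0 if s == states[node] else 0.0 for s in range(k)]
    lik = [1.0] * k
    for child, t in node:  # internal node: tuple of (child, branch length) pairs
        child_lik = partial_likelihoods(child, states, k, rate)
        for s in range(k):
            lik[s] *= sum(mk_prob(s, x, t, k, rate) * child_lik[x] for x in range(k))
    return lik

def root_marginals(tree, states, k=2, rate=1.0):
    """Posterior probability of each root state under a uniform root prior."""
    lik = partial_likelihoods(tree, states, k, rate)
    total = sum(lik)
    return [value / total for value in lik]
```

For a three-tip star tree where two tips on short branches share state 0 and the third, on a long branch, has state 1, `root_marginals((("A", 0.1), ("B", 0.1), ("C", 1.0)), {"A": 0, "B": 0, "C": 1})` puts most of the probability on state 0. Models whose transition probabilities cannot be written down like `mk_prob` are where this approach breaks and simulation-trained methods become attractive.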

12

Assessing the potential of bee-collected pollen sequence data to train machine learning models for geolocation of sample origin

Hayes, R. A.; Kern, A. D.; Ponisio, L. C.

2026-04-01 bioinformatics 10.64898/2026.03.29.715128 medRxiv

Top 0.1%

0.7%

Pollen is a robust and widespread substance that captures a historical snapshot of a specific time and place, and it can be used to track movements through space by examining the pollen deposited on various objects. Palynology, the study of pollen, is used across fields such as conservation, natural history, and forensics, where it is particularly useful for tracing the origin and movement of objects. However, pollen has remained underutilized because many pollen taxa are difficult to distinguish beyond the family level and reference material to support location predictions is limited. Recent developments in pollen DNA metabarcoding have rectified these issues, but much of the available pollen data come primarily from wind-pollinated species, which are widespread and less informative about specific sample locations. Bee-collected pollen is an untapped resource for training predictive models to geolocate sample origin. Here we compiled bee-collected pollen DNA sequence relative-abundance data from three projects in the western U.S. and assessed how accurately supervised machine learning models predict the location of sample origin based solely on pollen assemblage, without the need to incorporate additional data. Random Forest and k-Nearest Neighbors models yielded high accuracy across all projects. We also found that models trained on taxonomically clustered pollen amplicon sequence variants (ASVs) performed slightly better than those trained on raw sequence data, but the difference was minor, indicating that models trained on raw sequence data can reliably predict location while avoiding the time-consuming taxonomic assignment process. Our results demonstrate the utility of repurposing bee-collected pollen for geolocation and provide a framework for employing supervised machine learning in future geolocation efforts.
Highlights:
- Bee-collected pollen metabarcoding data were used to accurately predict sample origin
- Random Forest and k-Nearest Neighbors algorithms were the most accurate, with the lowest error
- Training sets of taxonomically classified and raw DNA sequence data performed comparably
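A minimal sketch of the k-nearest-neighbors idea applied to pollen assemblages: each sample is a {sequence variant: relative abundance} profile labeled with its site, and a new sample is assigned the majority label among its k most similar training samples. The Bray-Curtis dissimilarity, k = 3 and the toy labels are assumptions for illustration; the study's actual features, distance and tuning are not given in the abstract.

```python
from collections import Counter

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {variant: abundance} profiles (0 = identical)."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    total = sum(a.values()) + sum(b.values())
    return 1.0 - 2.0 * shared / total if total else 0.0

def knn_predict(query, training, k=3):
    """Majority site label among the k training profiles least dissimilar to the query."""
    nearest = sorted(training, key=lambda item: bray_curtis(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (profile, site) pairs.
training = [
    ({"asv1": 0.7, "asv2": 0.3}, "site_A"),
    ({"asv1": 0.6, "asv2": 0.4}, "site_A"),
    ({"asv3": 0.9, "asv4": 0.1}, "site_B"),
    ({"asv3": 0.8, "asv2": 0.2}, "site_B"),
]
prediction = knn_predict({"asv1": 0.65, "asv2": 0.35}, training)
```

Because the profiles are keyed by variant rather than by taxon, the same code works on raw sequence variants or on taxonomically clustered ones, which mirrors the comparison the study reports.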

13

Outperforming the Majority-Rule Consensus Tree Using Fine-Grained Dissimilarity Measures

Takazawa, Y.; Takeda, A.; Hayamizu, M.; Gascuel, O.

2026-03-18 bioinformatics 10.64898/2026.03.16.712085 medRxiv

Top 0.1%

0.6%

Phylogenetic analyses often require the summarization of multiple trees, e.g., in Bayesian analyses to obtain the centroid of the posterior distribution of trees, or to determine the consensus of a set of bootstrap trees. The majority-rule consensus tree is the most commonly used summary. It is easy to compute and minimizes the sum of Robinson-Foulds (RF) distances to the input trees. In mathematical terms, the majority-rule consensus tree is the median of the input trees with respect to the RF distance. However, because of the coarse nature of the RF distance, which only considers whether two branches induce exactly the same bipartition of the taxa, highly unresolved trees can be produced when the phylogenetic signal is low. To overcome this limitation, we propose using median trees with respect to finer-grained dissimilarity measures between trees. These measures include a quartet distance between tree topologies, and transfer distances, which quantify the similarity between bipartitions, in contrast to the 0/1 view of RF. We describe fast heuristic consensus algorithms for transfer-based tree dissimilarities, capable of efficiently processing trees with thousands of taxa. Through evaluations on simulated datasets in both Bayesian and bootstrapping maximum-likelihood frameworks, our results show that our methods improve consensus tree resolution in scenarios with low to moderate phylogenetic signal, while providing better or comparable dissimilarities to the true phylogeny. Applying our methods to a mammal phylogeny and a large HIV dataset of over nine thousand taxa confirms the improvement with real data. These results demonstrate the usefulness of our new consensus tree methods for analyzing the large datasets that are available today. Our software, PhyloCRISP, is available from https://github.com/yukiregista/PhyloCRISP.
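The RF distance and the majority-rule consensus can both be computed from bipartitions (splits), and the transfer distance between two bipartitions reduces to a small formula: if two bipartitions of n taxa have sides A and A', and h = |A Δ A'|, the number of taxa that must be moved is min(h, n - h). The sketch below, in plain Python with each tree given as a list of one side of each nontrivial split, illustrates these definitions only; it is not PhyloCRISP and omits the paper's transfer-based consensus heuristics.

```python
from collections import Counter

def canonical(side, taxa):
    """Store a bipartition as the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    ref = min(taxa)
    return frozenset(taxa) - side if ref in side else side

def splits(tree, taxa):
    """tree: list of one side of each nontrivial bipartition."""
    return {canonical(s, taxa) for s in tree}

def rf_distance(tree1, tree2, taxa):
    """Robinson-Foulds distance: number of bipartitions present in exactly one tree."""
    return len(splits(tree1, taxa) ^ splits(tree2, taxa))

def majority_rule(trees, taxa):
    """Bipartitions found in more than half of the input trees."""
    counts = Counter(s for t in trees for s in splits(t, taxa))
    return {s for s, c in counts.items() if c > len(trees) / 2}

def transfer_distance(side1, side2, taxa):
    """Fewest taxa to move so that one bipartition becomes the other."""
    h = len(frozenset(side1) ^ frozenset(side2))
    return min(h, len(taxa) - h)
```

With taxa A to E, the splits AB|CDE and ABC|DE differ by moving one taxon, so their transfer distance is 1, whereas RF simply counts them as two different splits. That graded comparison is what lets transfer-based consensus keep partially supported branches that majority rule would collapse.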

14

Atlantic and Indo-Pacific separation in Palythoa sibling species: phylogenomic analyses using ultraconserved elements

Hansen, L. A. J.; Santos, M. E. A.; Kise, H.; Zamora-Jordan, N.; Reimer, J. D.

2026-04-29 evolutionary biology 10.64898/2026.04.26.720863 medRxiv

Top 0.1%

0.6%

The delineation of closely related species remains a persistent challenge in Zoantharia, where morphological plasticity and limited genetic differentiation complicate taxonomy. In this study, we investigated the phylogenetic relationship between the widely distributed sibling taxa Palythoa tuberculosa (Indo-Pacific) and Palythoa caribaeorum (Atlantic) using ultraconserved elements (UCEs) recovered from genome skimming. A dataset comprising 116 loci (35,699 bp) across 37 specimens from Brazil, the Red Sea, Okinawa, and New Caledonia was analysed using both concatenated maximum-likelihood and coalescent-based approaches. Phylogenetic reconstructions did not recover monophyletic groups corresponding to either species or geographic origin, instead revealing intermixed lineages across the Indo-Pacific and Atlantic regions. Concordance factor analyses indicated low gene concordance and moderate site concordance, suggesting pervasive gene tree discordance rather than a lack of phylogenetic signal. These patterns are consistent with previous studies based on mitochondrial, nuclear, and reduced-representation datasets, indicating that increased marker resolution does not resolve species boundaries within this complex. The observed lack of differentiation may reflect ongoing or recent connectivity among populations, potentially facilitated by long-distance dispersal (through anthropogenic rafting, historical range expansion, or biological invasion), or biological processes such as incomplete lineage sorting. The results support the hypothesis that P. tuberculosa and P. caribaeorum represent a species complex or a case of incipient speciation rather than fully distinct evolutionary lineages. These findings indicate that genome-scale data alone may be insufficient to resolve very recent divergences, supporting the need for integrative approaches to resolve complicated species boundaries in zoantharians.

15

Long-distance dispersal drives global tropical distributions in a widespread moth lineage (Lepidoptera: Limacodidae)

Taberer, T. R.; Espeland, M.; Martin, S.; Coulson, T.; Clegg, S. M.

2026-05-18 evolutionary biology 10.64898/2026.05.16.724310 medRxiv

Top 0.1%

0.6%

Understanding how global biodiversity patterns arise is a central theme of biogeography, with contemporary theory recognising the roles of both dispersal and vicariance. Genera that are broadly distributed can provide important systems for disentangling the relative influence of these processes across evolutionary timescales. However, many lesser-studied groups, particularly those in the tropics, lack densely sampled phylogenies, which hinders robust inference of their evolutionary and biogeographic history. This study investigates the global diversification and systematics of the putative pantropical moth genus Parasa Moore (Lepidoptera: Limacodidae), with the aim of assessing the relative importance of dispersal and vicariance in shaping its distribution. Medium-coverage whole genome sequencing of specimens, predominantly from museum collections, was used to generate a globally sampled time-calibrated phylogeny of Parasa and associated genera (the Parasa-complex). Ancestral range estimation analyses were employed to infer geographical origins and possible dispersal times between bioregions. The Parasa-complex originated in Africa in the late Oligocene (~24 Ma) and, through a series of long-distance dispersal events during the early to mid Miocene, expanded into Asia (~23 Ma) and the Americas (~21 Ma). Across all regions, dispersal was the dominant process shaping present-day distributions, with a limited role of vicariance in some subregions. Phylogenetic analyses further demonstrated that Parasa is not monophyletic, with multiple independent lineages contributing to its apparent pantropical distribution. These findings highlight a central role of long-distance dispersal in generating certain global distributions. The results support a dynamic model of range evolution involving rapid Miocene dispersal and subsequent regional diversification.
In addition, the non-monophyly of Parasa requires substantial taxonomic revision, underscoring the importance of robust phylogenetic frameworks for interpreting global biodiversity patterns.

16

Disparity analyses are robust to ancestral state estimation uncertainty

Scutt, C. N.; Cooper, N.; Thomas, G. H.; Guillerme, T.

2026-04-22 evolutionary biology 10.64898/2026.04.21.719166 medRxiv

Top 0.1%

0.6%

Morphological trait datasets and phylogenies are routinely paired to investigate macroevolutionary patterns in disparity analyses. However, incomplete fossil sampling can distort disparity estimates, obscuring true evolutionary signals. Ancestral state estimation, which can be applied to both continuous and discrete traits, can extend these analyses beyond incomplete fossil data, for example in investigations of disparity through time. However, the point in the disparity pipeline at which ancestral states are estimated, together with the inevitable uncertainty in these estimates, complicates their integration. Determining the most robust workflow for integrating ancestral state estimation into disparity analyses remains a critical methodological challenge. Using simulations to obtain a ground-truth disparity value, we evaluated different approaches to performing ancestral state estimation and incorporating uncertainty across varying continuous and discrete trait models, fossil sampling densities and disparity metrics. Ancestral state estimation generally improved recovery of true disparity relative to tip-only analyses, though the optimal approach depended on the interaction between trait model and fossil sampling density. For continuous traits, probabilistic approaches were most accurate, but were sensitive to model misspecification under low fossil sampling density. For discrete traits, pre-ordination methods were most reliable, and probabilistic approaches outperformed point estimates under low sampling, while point estimates became increasingly accurate as sampling density increased. Fossil sampling density was a stronger predictor of disparity accuracy than the choice of estimation method, underscoring that methodologies are only as powerful as the data provided. Our findings offer a practical decision framework for selecting the most appropriate workflow given the sampling density and trait characteristics of a dataset.
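As a toy illustration of why adding estimated ancestors changes a disparity estimate, the sketch computes the sum of variances, one commonly used disparity metric, for hypothetical two-dimensional trait values with and without a single estimated ancestral node. The trait values and the choice of metric are assumptions for illustration; the abstract does not list the metrics the study compared.

```python
from statistics import variance

def sum_of_variances(points):
    """Disparity as the sum, over trait axes, of the sample variance."""
    return sum(variance(axis) for axis in zip(*points))

# Hypothetical 2-D trait values for four tips and one estimated ancestor.
tips = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
ancestor = [(1.0, 1.0)]

tip_only = sum_of_variances(tips)                  # 8/3, about 2.667
with_ancestor = sum_of_variances(tips + ancestor)  # 2.0
```

Here the central ancestor lowers the estimate from about 2.667 to 2.0. Whether such a shift moves the estimate toward or away from the true disparity depends on the clade's history and on how accurately ancestors are estimated, which is exactly what the simulations above are designed to measure.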

17

k-Nearest Common Leaves algorithm for phylogenetic tree completion

Koshkarov, A.; Tahiri, N.

2026-04-04 evolutionary biology 10.64898/2026.04.02.716144 medRxiv

Top 0.1%

0.5%

Phylogenetic trees represent the evolutionary histories of taxa and support tasks such as clustering and Tree of Life reconstruction. Many established comparison methods, including the Robinson-Foulds (RF) distance, assume identical taxon sets. A methodological gap remains for trees with distinct but overlapping taxa. Existing approaches either prune non-common leaves, which can discard information, or complete both trees such that they share the same taxa. Completion is more comprehensive, but current methods typically ignore branch lengths, which are essential for identifying evolutionary patterns. This paper introduces k-Nearest Common Leaves (k-NCL), an algorithm for completing rooted phylogenetic trees defined on different but overlapping taxa. The method uses branch lengths and topological characteristics and does not rely on a specific distance measure. The k-NCL algorithm is designed to preserve evolutionary relationships in the trees under comparison. The running time is O(n²), where n is the size of the union of the two leaf sets. Additional properties include preservation of original distances and topology, symmetry, and uniqueness of the completion. Implemented in Python, k-NCL is evaluated on biological datasets of amphibians, birds, mammals, and sharks. Experimental results show that RF combined with k-NCL improves phylogenetic tree clustering performance compared to the RF(+) tree completion approach. Availability and implementation: An open-source implementation of k-NCL in Python and the datasets used in this study are available at https://github.com/tahiri-lab/KNCL.
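
k-NCL itself is not reproduced here; its implementation is in the linked repository. As a reference point, the pruning baseline that the abstract contrasts it with can be sketched: restrict two rooted trees to their common leaves and count the clusters found in only one of them (an unnormalized rooted RF distance). Representing trees as nested Python tuples is an assumption of this sketch, and, like pruning in general, it ignores branch lengths, which is the limitation k-NCL is designed to address.

```python
def clusters(tree):
    """Return (root leaf set, set of all clusters) for a nested-tuple rooted tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            cluster = frozenset([node])
        else:  # an internal node: union of its children's leaf sets
            cluster = frozenset().union(*(walk(child) for child in node))
        found.add(cluster)
        return cluster

    root = walk(tree)
    return root, found


def rf_on_common_leaves(t1, t2) -> int:
    """Rooted RF distance after pruning both trees to their shared leaves."""
    root1, c1 = clusters(t1)
    root2, c2 = clusters(t2)
    common = root1 & root2

    def nontrivial(cs):
        # Clusters of the pruned tree are the original clusters intersected
        # with the common leaf set; drop singletons, empty sets and the root.
        restricted = {c & common for c in cs}
        return {c for c in restricted if 1 < len(c) < len(common)}

    return len(nontrivial(c1) ^ nontrivial(c2))


t1 = ((("A", "B"), "C"), ("D", "X"))  # X occurs only in t1
t2 = ((("A", "B"), "D"), ("C", "Y"))  # Y occurs only in t2
print(rf_on_common_leaves(t1, t2))  # {A,B,C} vs {A,B,D} differ: 2
```

Pruning discards X and Y entirely, so any signal they carry is lost; completion-based methods such as k-NCL instead add the missing leaves to each tree before comparison.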

18

Closely related, yet phenotypically different - Genome assemblies of two sister species of widow spiders: Latrodectus hasselti and L. katipo, Theridiidae

Ivanov, V.; Uludag, K. O.; Schöneberg, Y.; Schneider, J. M.; Kennedy, S.; Hamadou, A. B.; Vink, C. J.; Krehenwinkel, H.

2026-04-21 genomics 10.64898/2026.04.17.719154 medRxiv

Top 0.1%

0.5%

Widow spiders of the genus Latrodectus are important animals for biomedical, pest and conservation research. Here, we present the assembled genomes of two closely related Latrodectus species: the Australian L. hasselti and the New Zealand endemic L. katipo. The genome of L. katipo consists of 13 scaffolds likely corresponding to chromosomes (90% of the total length) and 1267 short scaffolds (10%). It has a total length of 1.5 Gbp and a BUSCO completeness score of 94.9%. The genome of L. hasselti consists of 379 scaffolds and has a total length of 1.7 Gbp and a BUSCO completeness score of 95.4%. The repeat content is very similar in both genomes, with a total proportion of 37.2% for L. katipo and 39.9% for L. hasselti. Genome annotation predicted 12,706 and 15,111 genes for L. katipo and L. hasselti, respectively. An ortholog analysis shows large overlap between orthogroups; together with the higher gene count in L. hasselti, this suggests either duplication events in L. hasselti or gene loss in L. katipo.

19

Estimating Bayesian phylogenetic information content using geodesic distances

Milkey, A.; Lewis, P. O.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715656 medRxiv

Top 0.2%

0.5%

A new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
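
The posterior-to-prior variance ratio can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: every sampled tree shares one topology and is stored as a vector of edge lengths, so the geodesic distance within that single orthant of treespace reduces to Euclidean distance; variance is estimated from pairwise distances via Var = Σᵢⱼ dᵢⱼ² / (2n²), which is exact only in Euclidean space. The paper's own estimator for trees of differing topology may differ, and the toy samples below are invented.

```python
import numpy as np


def pairwise_euclidean(x: np.ndarray) -> np.ndarray:
    """Pairwise distances between rows (stand-in for geodesic distances)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def variance_from_distances(dist: np.ndarray) -> float:
    """Sample variance from a distance matrix: sum(d_ij^2) / (2 n^2)."""
    n = dist.shape[0]
    return float((dist ** 2).sum() / (2 * n * n))


def scale_to_unit_mean_length(trees: np.ndarray) -> np.ndarray:
    """Rescale a sample so its mean tree length (sum of edge lengths) is 1."""
    return trees / trees.sum(axis=1).mean()


def relative_variance(posterior, prior, scale=True):
    """Posterior/prior variance ratio: ~1 = no information, ~0 = complete."""
    if scale:  # equalize mean tree length to focus on non-length information
        posterior = scale_to_unit_mean_length(posterior)
        prior = scale_to_unit_mean_length(prior)
    v_post = variance_from_distances(pairwise_euclidean(posterior))
    v_prior = variance_from_distances(pairwise_euclidean(prior))
    return v_post / v_prior


rng = np.random.default_rng(1)
prior = rng.exponential(0.1, size=(200, 7))  # toy edge-length vectors
posterior = np.clip(rng.normal(0.1, 0.005, size=(200, 7)), 1e-6, None)
print(relative_variance(posterior, prior))  # well below 1: informative data
```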

20

Inferring evolutionary relationships among Crenotia species (Bacillariophyta): Evidence from natural populations and monoclonal strains from Slovakia

Hindakova, A.; Urbankova, P.; Kulichova, J.

2026-04-15 evolutionary biology 10.64898/2026.04.13.718240 medRxiv

Top 0.2%

0.5%

Diatoms exhibit remarkable diversity in valve morphology, with the raphe system being a fundamental feature in classification of raphid pennate diatoms. The repeated loss of one of the two raphes during evolution has led to multiple independent origins of monoraphid diatoms. The phylogenetic affinities of the monoraphid genus Crenotia A. Z. Wojtal, erected from Achnanthidium thermale Rabenhorst, have not yet been clarified with molecular data. In this study, natural populations of Crenotia and monoclonal strains derived from them were examined using morphological observations and multilocus phylogenetic analyses based on nuclear and plastidial molecular markers. Three species of the genus Crenotia form a well-supported clade placed within a subgroup of monoraphid genera, which are closely related to Cymbellales D.G. Mann and other biraphid diatoms. This study establishes the first molecular framework for representatives of the genus Crenotia, demonstrating their monophyly and congruent interspecific relationships recovered with multiple molecular markers. The low intraspecific sequence variability and substantial interspecific divergence, together with clear morphological and ecological differentiation, support the recognition of the three investigated species.